Similar resources
Differential Eligibility Vectors for Advantage Updating and Gradient Methods
In this paper we propose differential eligibility vectors (DEV) for temporal-difference (TD) learning, a new class of eligibility vectors designed to bring out the contribution of each action in the TD-error at each state. Specifically, we use DEV in TD-Q(λ) to more accurately learn the relative value of the actions, rather than their absolute value. We identify conditions that ensure convergen...
Gradient Convergence in Gradient Methods
For the classical gradient method x_{t+1} = x_t − γ_t ∇f(x_t) and several deterministic and stochastic variants, we discuss the issue of convergence of the gradient sequence ∇f(x_t) and the attendant issue of stationarity of limit points of x_t. We assume that ∇f is Lipschitz continuous, and that the stepsize γ_t diminishes to 0 and satisfies standard stochastic approximation conditions. We show that eit...
Gradient Convergence in Gradient methods with Errors
We consider the gradient method x_{t+1} = x_t + γ_t(s_t + w_t), where s_t is a descent direction of a function f : ℝ^n → ℝ and w_t is a deterministic or stochastic error. We assume that ∇f is Lipschitz continuous, that the stepsize γ_t diminishes to 0, and that s_t and w_t satisfy standard conditions. We show that either f(x_t) → −∞ or f(x_t) converges to a finite value and ∇f(x_t) → 0 (with probability 1 in t...
Policy Gradient Methods
A policy gradient method is a reinforcement learning approach that directly optimizes a parametrized control policy by gradient descent. It belongs to the class of policy search techniques that maximize the expected return of a policy in a fixed policy class while traditional value function approximation approaches derive policies from a value function. Policy gradient approaches have various a...
Spectral Projected Gradient Methods
The poor practical behavior of (1)-(2) has been known for many years. If the level sets of f resemble long valleys, the sequence {xk} displays a typical zig-zagging trajectory and the speed of convergence is very slow. In the simplest case, in which f is a strictly convex quadratic, the method converges to the solution with a Q-linear rate of convergence whose factor tends to 1 when the conditi...
Journal
Journal title: Journal of Mathematical Analysis and Applications
Year: 1978
ISSN: 0022-247X
DOI: 10.1016/0022-247x(78)90114-2